Note 1: This is the first version of suggested lecture notes for a second-level
course on advanced topics in database systems for master's students of
Computer Science. A prerequisite in algorithms and an exposure to database
systems are required. Additional reading may require exposure to mathematical
logic, a topic that seems to be receding from newer editions of Computer
Science theory books.

Note 2: These notes are based on M.Y. Vardi's survey, listed as reference [7].
This selective rewrite on functional dependency is intended to provide a few
clarifications while avoiding topics relating to logic.

1 Introduction

Manipulation of a large store of structured information has been a fundamental
requirement in many computer-based applications; this requirement has evolved into
database systems and has promoted the associated technologies in the West. A database
management system is now understood to be a computer-based system maintaining a large
amount of permanent data pertaining to a real-world organization or institution,
together with mechanisms to search, add, update and delete data and mechanisms for
administrative control such as granting and revoking privileges, defining views for
restricted access, and archiving. In the business domain the relational database story
has been a success owing to cost-effectiveness and the support of a sound formalism for
organizing and managing structured data. IBM's pioneering implementation of these
concepts during the 1970s was System R. As is apparent from the contemporary literature,
relational databases still form the core of most database management systems, more than
three decades after their introduction.

Conceived in the late 1960s, the relational model of databases views a database as a
collection of relations where each relation is a set of well-defined tuples. A relation
is synonymous with a table whose columns are named by attributes. This notion of
databases, apparent from the work of E.F. Codd in the 1970s, is founded on the following
two principles:

(a) All information pertaining to an application is captured as data values in relations
or tables.

(b) No information is represented by ordering of columns or rows of any table. 

Searching, adding, deleting and updating of data is effected by manipulating relations,
either through relational algebra, which has a procedural flavor, or through relational
calculus, which has a declarative flavor. Codd's theorem states that any relational
algebra expression can be converted efficiently to an equivalent relational calculus
expression and vice versa. Here efficiency is interpreted to mean that there exists a
conversion algorithm whose running time is bounded by a polynomial in the size of the
input expression. The relational model is almost devoid of semantics. Therefore
meaningful relations in a given context are understood by specifying semantic or
integrity constraints. In particular the notion of functional dependency introduced by
Codd in 1972 is of significance in practice, and the related notion of implication,
apparent from the work of P.A. Bernstein, is now considered fundamental. Subsequently in
1976 multivalued dependency was introduced by R. Fagin and C. Zaniolo independently,
opening up the way for data dependency analyses. Much of the theory about the relational
model during the 1970s and the first half of the 1980s focused on query processing and
optimization and on database design including dependency analysis (see [S]). Another
studied problem relating to integrity constraints, which define the permissible data in
relation instances, is this: assuming that the current data values satisfy the
constraints before an update, how can one efficiently check (or decide that no checking
is necessary) that the constraints hold after the update?

This condensed survey is essentially a select rewrite of M.Y.Vardi's survey [7] incor- 
porating more explanations as needed. Further pertinent technical details, associated 
concepts such as other types of dependencies and their significance in the real-world, in- 
tractability results, relationships to mathematical logic and combinatorial problems such 
as constraint satisfaction and many original references can be found in the database lit- 
erature from the 1970's (see for example IH [3, El IS] ) . 

2 Preliminaries 

In the sequel, it is understood that the implicit context is a given real-world
application. We use $I, J, K, \ldots$ to denote the different tables that together form a
database. The set $X$ of all attributes in a relation $I$ is referred to as a relation
scheme. We then say $I$ is defined over $X$. By convention we denote by $U$ the set of all
attributes occurring across all tables comprising the database. Letters from the head of
the alphabet, $A, B, C, \ldots$, denote attributes. Letters from the tail,
$R, S, \ldots, X, Y, Z$, denote sets of attributes. For the sake of convenience we make no
distinction between $\{A\}$ and $A$. Given the attribute sets $X$ and $Y$, $XY$ will
denote $X \cup Y$; $ACE$ is a shorthand for $\{A, C, E\}$. Associated with each column of
a table is a domain of values from which the entries in the column are taken. That is,
for each attribute $A$ we have a finite or infinite set $\mathrm{Dom}(A)$. In the
relational model the elements of $\mathrm{Dom}(A)$ are assumed to be atomic in the usual
sense. We denote $\mathrm{DOM} = \bigcup_{j=1}^{n} \mathrm{Dom}(A_j)$ where
$U = \{A_1, \ldots, A_n\}$. We recall that by tuple we mean a row in a table that
specifies an appropriate value for each attribute. By $u, v, w, \ldots$ we denote tuples.
Given an attribute set $X$, a tuple $u$ on $X$ is then a mapping
$u : X \to \mathrm{DOM}$ such that for each $A \in X$, $u(A) \in \mathrm{Dom}(A)$. $|X|$
can be as large as the arity, i.e., the total number of attributes in the considered
relation. If $v$ is a tuple on $X$ then $v[Y]$ will mean the restriction of $v$ to $Y$
where $Y \subseteq X$. We take $v[X] = v$.

2.1 Projection and join 

Let $X$ be a relation scheme and $I$ a relation on $X$. Given $Y \subseteq X$ we define
the projection $\pi_Y(I)$, a relation on $Y$, as $\pi_Y(I) = \{u[Y] \mid u \in I\}$. Let
$I_1, \ldots, I_k$ be relations on $X_1, \ldots, X_k$ and let
$X = \bigcup_{j=1}^{k} X_j$. Then we define the join $I_1 \bowtie \cdots \bowtie I_k$,
abbreviated to $\bowtie_{j=1}^{k} I_j$, as

$\bowtie_{j=1}^{k} I_j = \{w \mid w \text{ is a tuple on } X \text{ and } w[X_j] \in I_j \text{ for } 1 \le j \le k\}.$

Projection builds a new relation from a given one by selecting one or more attributes.

Join combines tuples from two or more relations when they agree on common columns.
Join is commutative as well as associative. In some sense projection and join are duals.
Beginning with the following relations, the examples below illustrate that projection and
join cannot always be regarded as inverses.

  I:        J:        K:      L:      M:        N:

  A B C     A B C     A B     B C     A B C     A B

  0 0 0     0 0 0     0 0     0 0     0 0 0     0 0
  0 0 1     1 0 1     0 1     0 1     0 0 1
  1 0 0
  1 0 1

We have $\pi_{AB}(I) \bowtie \pi_{BC}(I) = I$ and $\pi_{AB}(J) \bowtie \pi_{BC}(J) = I \ne J$.
Also $K \bowtie L = M$ while $\pi_{AB}(M) = N \ne K$ and $\pi_{BC}(M) = L$.
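
The identities of this example can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: relations are modelled as sets of tuples over an ordered attribute list, and the helper names `project` and `join` as well as the concrete tuples of I, J, K and L are assumptions chosen to be consistent with the identities stated above.

```python
from itertools import product

def project(rel, schema, attrs):
    """Projection: restrict every tuple of rel (over schema) to attrs."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}

def join(rel1, schema1, rel2, schema2):
    """Natural join of two relations; returns (relation, schema)."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out = set()
    for t1, t2 in product(rel1, rel2):
        # Keep the pair only if the tuples agree on all common attributes.
        if all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common):
            out.add(t1 + tuple(t2[schema2.index(a)] for a in extra))
    return out, schema1 + extra

ABC, AB, BC = list("ABC"), list("AB"), list("BC")
I = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)}   # assumed instance
J = {(0, 0, 0), (1, 0, 1)}                         # assumed instance
K = {(0, 0), (0, 1)}
L = {(0, 0), (0, 1)}

join_I, _ = join(project(I, ABC, AB), AB, project(I, ABC, BC), BC)
join_J, _ = join(project(J, ABC, AB), AB, project(J, ABC, BC), BC)
M, _ = join(K, AB, L, BC)
N = project(M, ABC, AB)
```

Here `join_I == I` (a lossless decomposition) whereas `join_J` strictly contains `J`, and `N` differs from `K` because the tuple of K with B = 1 finds no partner in L.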

Generalization to the following lemma is immediate.

Lemma 1:

(a) Let $I$ be a relation on $X$. Let $X_1, \ldots, X_k$ be attribute sets such that
$X = \bigcup_{j=1}^{k} X_j$. Then $I \subseteq \bowtie_{j=1}^{k} \pi_{X_j}(I)$.

The simultaneous projection of $I$ onto $X_1, \ldots, X_k$ is referred to as a
decomposition. The decomposition is lossless when $I = \bowtie_{j=1}^{k} \pi_{X_j}(I)$;
otherwise it is lossy.

(b) Let $I_1, \ldots, I_k$ be relations on $X_1, \ldots, X_k$, respectively. Then
$\pi_{X_j}(\bowtie_{i=1}^{k} I_i) \subseteq I_j$ for each $j$.
Remark: It is appropriate to interpret lemma 1 for legal relations, i.e., when the
relations are meaningful w.r.t. a given real-world scenario. More details follow.



3 Functional dependency 

The presence of redundancy and anomalies in instances of relations, and the natural
requirement to do away with them, has motivated the dependency theory of relational
databases and hence database design. We first recall what are commonly referred to as
Codd's anomalies in a relation by considering the following example.



STUDENT    DEPARTMENT      SUPERVISOR

Alice      Cryptology      John
Bob        Cryptology      John
Carol      Graph Theory    Yohann
Darrel     Graph Theory    Yohann
Frank      Graph Theory    Yohann

The cited problems are:

(a) Redundancy: That John is a supervisor for Cryptology or that Yohann is a supervisor
for Graph Theory can get repeated in many tuples.

(b) Potential inconsistency: If Carol in the Graph Theory department gets a new
supervisor, does it mean that the department gets two supervisors, or is the intended
meaning to change the supervisor to the new supervisor for all students in the Graph
Theory department?

It stands to reason that there is a functional dependency between DEPARTMENT and
SUPERVISOR - this is a kind of semantic constraint on the data that comprise legal
relations. In this case we say that DEPARTMENT determines SUPERVISOR and write
DEPARTMENT $\to$ SUPERVISOR.

In formal terms, for attribute sets $X, Y$, $X \to Y$ is a functional dependency (fd) over
a relation scheme $R$, i.e., $XY \subseteq R$, if for all tuples $u, v \in I$,
$u[X] = v[X] \Rightarrow u[Y] = v[Y]$, where $I$ is any legal relation on $R$. We say $I$
satisfies a set of fds $\Sigma$ if $I$ satisfies all fds in $\Sigma$.
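
The definition can be turned directly into a check on a relation instance. The sketch below is illustrative only: the function name `satisfies_fd` and the representation of tuples as Python dictionaries are assumptions, and the data is the student table from the example above.

```python
def satisfies_fd(rel, lhs, rhs):
    """Return True iff u[lhs] == v[lhs] implies u[rhs] == v[rhs] in rel."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # The first tuple with this lhs value fixes the rhs value;
        # any later disagreement violates the fd.
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [
    ("Alice", "Cryptology", "John"),
    ("Bob", "Cryptology", "John"),
    ("Carol", "Graph Theory", "Yohann"),
    ("Darrel", "Graph Theory", "Yohann"),
    ("Frank", "Graph Theory", "Yohann"),
]
rel = [dict(zip(("STUDENT", "DEPARTMENT", "SUPERVISOR"), r)) for r in rows]

dept_determines_sup = satisfies_fd(rel, ["DEPARTMENT"], ["SUPERVISOR"])
sup_determines_student = satisfies_fd(rel, ["SUPERVISOR"], ["STUDENT"])
```

Note that such a check only tests one instance; an fd is a constraint on all legal relations, so an instance can refute an fd but cannot establish it.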

Two notions, namely equivalence and redundancy of fds, are useful in the context of
manipulating fds mechanically. Two sets of fds $\Delta$ and $\Sigma$ are equivalent,
written $\Sigma \equiv \Delta$, if they are satisfied by precisely the same set of legal
relations. In other words $I$ satisfies $\Delta$ $\iff$ $I$ satisfies $\Sigma$. A set
$\Sigma$ of fds is redundant if there is a $\Delta \subset \Sigma$ such that
$\Delta \equiv \Sigma$. These can be expressed by a more fundamental notion viz.,
implication. We write $\Sigma \models \sigma$ to say that a set $\Sigma$ of fds implies an
fd $\sigma$. By this we mean that any relation that satisfies $\Sigma$ necessarily
satisfies $\sigma$. For example $\{B \to C, A \to B\} \models A \to C$. The database
implication (or inference) problem for fds is: given $\Sigma$ that necessarily holds for
any legal instance of a database and given any $\sigma$, does $\Sigma \models \sigma$?
In terms of implication, redundancy and equivalence can be stated as

• $\Sigma$ is redundant iff there is an fd $\sigma \in \Sigma$ such that
$\Sigma - \{\sigma\} \models \sigma$.

• $\Delta \equiv \Sigma$ iff $\Delta \models \sigma$ for any $\sigma \in \Sigma$ and
$\Sigma \models \delta$ for any $\delta \in \Delta$.
Remarks:

1. If $X \cap Y = \emptyset$ then the fd $X \to Y$ may be interpreted as a case of
multivalued dependency (see [1] for example).

2. Let $I$ be any relation on $R$. Then an fd $X \to Y$ where $XY \subseteq R$ is
satisfied by $I$ iff $\pi_{XY}(I)$ satisfies $X \to Y$.

3. It has been shown that fds can be interpreted as formulas in propositional calculus.
To this end it is sufficient to interpret an fd like $A_1 \ldots A_k \to B$ as an
equivalent Horn formula and then take note of the fact that for Horn formulas
satisfiability can be tested in polynomial time. In turn this implies the existence of a
polynomial-time algorithm for the implication problem.

In the above, finite as well as infinite relations are allowed though in practice we need
to consider only finite relations. Written as $\Sigma \models_f \sigma$, $\Sigma$ finitely
implies $\sigma$ if any finite relation $I$ that satisfies $\Sigma$ satisfies $\sigma$ as
well. Surely if $\Sigma \models \sigma$ then $\Sigma \models_f \sigma$.
Fact 1: Implication and finite implication coincide for fds.

An fd $X \to Y$ is said to be reduced if there is no proper subset $W \subset X$ such that
$\Sigma \models W \to Y$. We say $\Sigma$ is reduced if every fd in it is reduced. The
algorithm that follows outputs a reduced equivalent of a given set $\Sigma$ of fds.

Algorithm REDUCED($\Sigma$)
begin
    $\Delta \leftarrow \Sigma$
    for (each fd $X \to Y$ in $\Delta$) do
        for (each attribute $A$ in $X$) do
            if $\Delta \models X - A \to Y$
                then $X \leftarrow X - A$ in $X \to Y$
    return $\Delta$
end

Algorithm REDUCED($\Sigma$) depends on a test for implication of fds, which determines its
complexity.
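
A minimal Python sketch of REDUCED follows. It is not taken from the notes: fds are represented as (left-hand side, right-hand side) pairs of sets, and the implication test is assumed to be done by computing an attribute closure, the technique these notes develop in Section 3.2. The names `closure`, `implies` and `reduced` are hypothetical.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Implication test: fds |= lhs -> rhs iff rhs lies in the closure."""
    return set(rhs) <= closure(fds, lhs)

def reduced(fds):
    """Drop attributes from left-hand sides while the fd stays implied."""
    out = [(set(l), set(r)) for l, r in fds]
    for i, (lhs, rhs) in enumerate(out):
        for a in sorted(lhs):
            smaller = lhs - {a}
            if implies(out, smaller, rhs):
                lhs = smaller
                out[i] = (lhs, rhs)
    return [(frozenset(l), frozenset(r)) for l, r in out]

sigma = [({"A", "B"}, {"C"}), ({"A"}, {"B"})]
delta = reduced(sigma)
```

In this example the fd AB → C is replaced by A → C, because A → B makes the attribute B on the left-hand side superfluous.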

3.1 Formal system for functional dependencies 

A formal system for fds comprises a set of axioms and inference rules - this was first
studied by W.W. Armstrong in 1974 when the significance of implication was not apparent.
Armstrong's system, denoted by F-A, consists of one axiom and three inference rules.

FDA0: (Reflexivity) $\vdash X \to X$.

FDA1: (Transitivity) $X \to Y, Y \to Z \vdash X \to Z$.

FDA2: (Augmentation and projection) $X \to Y \vdash W \to Z$ if $X \subseteq W$ and
$Z \subseteq Y$.

FDA3: (Union) $X \to Y, Z \to W \vdash XZ \to YW$.

Describing F-A requires the notion of a derivation - a derivation of $\sigma$ from
$\Sigma$ is denoted by $\Sigma \vdash \sigma$. In a formal system such as F-A, given a set
$\Sigma$ of fds and an fd $\sigma$, by a derivation of $\sigma$ we mean a sequence
$\sigma_1, \ldots, \sigma_n = \sigma$ in which each $\sigma_i$ ($1 \le i \le n$) is a
member of $\Sigma$, or an instance of an axiom scheme, or follows from the preceding
dependencies in the sequence by one of the inference rules. Soundness and completeness of
any system like F-A are expressed as

(1) F-A is sound if $\Sigma \models \sigma$ is a necessary consequence of
$\Sigma \vdash \sigma$.

(2) F-A is complete if $\Sigma \vdash \sigma$ is a necessary consequence of
$\Sigma \models \sigma$.

The formal system FD given below is as in [7]. FD consists of FD1, FD2 and FD3.

FD1: (Reflexivity) $\vdash X \to \emptyset$.

FD2: (Transitivity) $X \to Y, Y \to Z \vdash X \to Z$.

FD3: (Augmentation) $X \to Y \vdash XZ \to YZ$.

Theorem 1: FD is sound and complete.

Proof: Soundness and completeness are proved separately as given below.
Soundness: As FD1 is vacuously true it is sufficient to show that individually FD2 and
FD3 are sound. Let $I$ be a relation on $R$, let $X, Y, Z$ be drawn from $R$ and let
$u, v \in I$ be any two tuples. First, considering FD2, assume that $I$ satisfies
$X \to Y$ and $Y \to Z$. Then (using the definition of fd) if $u[X] = v[X]$ we have
$u[Y] = v[Y]$ and hence $u[Z] = v[Z]$. That is, $I$ satisfies $X \to Z$ and hence FD2 is
sound. Next, considering FD3, assume that $I$ satisfies $X \to Y$. If $u[XZ] = v[XZ]$ then
$u[X] = v[X]$ and $u[Z] = v[Z]$. As $I$ satisfies $X \to Y$ we have $u[Y] = v[Y]$ and so
$u[YZ] = v[YZ]$. In other words $I$ satisfies $XZ \to YZ$ and thus FD3 is sound.

Completeness: Let $\Sigma$ be the set of fds that any legal relation $I$ on $R$ needs to
satisfy. Let $\sigma = X \to Y$ be any given fd. Completeness requires that if
$\Sigma \models \sigma$ then FD1, FD2 and FD3 are sufficient to conclude
$\Sigma \vdash \sigma$. This amounts to proving the contrapositive viz., if
$\Sigma \nvdash \sigma$ then $\Sigma \not\models \sigma$. In the context of $\Sigma$ we
define the closure $X^+$ of $X$ as $X^+ = \{A \mid \Sigma \vdash X \to A\}$. By FD1,
$\vdash X \to \emptyset$; now invoking FD3, taking $Z$ as any $A \in X$, we have
$\vdash X \to A$. Hence $X \subseteq X^+$. If $X^+ = X$, the only fds with left-hand side
$X$ derivable from $\Sigma$ are the trivial ones $X \to Y$ with $Y \subseteq X$, which
every relation satisfies; the argument below covers this case as well as the case
$X \subset X^+$. The following holds due to FD2 and FD3:
$W \to Z_1, W \to Z_2 \vdash W \to Z_1 Z_2$. Repeated use of this yields
$\Sigma \vdash X \to X^+$ since $\Sigma \vdash X \to A$ by definition, for all
$A \in X^+$. If as assumed $\Sigma \nvdash \sigma$ then we claim that
$Y \not\subseteq X^+$. If possible let the contrary hold viz., $Y \subseteq X^+$. In such
a case we can use FD1 and FD3 to show that $\vdash X^+ \to Y$. Combining this with the
fact $\Sigma \vdash X \to X^+$, by virtue of FD2 we get $\Sigma \vdash X \to Y$, which is
a contradiction. Therefore there exists some $B \in R$ such that $B \in Y$ but
$B \notin X^+$. We construct a specific legal relation $I$ consisting of two tuples
$u, v$. We prescribe that $u[A] = v[A]$ iff $A \in X^+$. In more detail, let
$u[A] = \alpha$ for all $A \in R$, $v[A] = \alpha$ for all $A \in X^+$ and
$v[A] = \beta$ for any $A \in R - X^+$, where $\alpha \ne \beta$. As $X \subseteq X^+$,
$u[X] = v[X]$ but by construction $u[Y] \ne v[Y]$. So $I$ does not satisfy $X \to Y$. We
now show that $I$ satisfies $\Sigma$. Let $S \to T$ be any fd in $\Sigma$. We need to show
that if $u[S] = v[S]$ then $u[T] = v[T]$. So suppose that $u[S] = v[S]$. Then by
construction $S \subseteq X^+$. Using FD1 and FD3, $\vdash X^+ \to S$. By the assumption
$S \to T \in \Sigma$, with FD2 we can conclude $S \to T \vdash X^+ \to T$. For any
$A \in T$, by FD1 and FD3, $\vdash T \to A$. Then by FD2 it follows that for all
$A \in T$, $\Sigma \vdash X^+ \to A$, and combining with $\Sigma \vdash X \to X^+$ we get
$\Sigma \vdash X \to A$. Therefore $T \subseteq X^+$ and by construction $u[T] = v[T]$.
Therefore $I$ satisfies $\Sigma$. Thus $I$ is one relation that satisfies $\Sigma$ but
does not satisfy $X \to Y$, i.e., $\Sigma \not\models \sigma$. ■
Remarks:

1. The counter-example constructed in the proof above is finite. It then follows that in
the case of fds implication and finite implication coincide.

2. The polynomial-time algorithm for implication of fds due to C. Beeri and
P.A. Bernstein depends on the efficient construction of the closure of $X$ w.r.t.
$\Sigma$ (otherwise read $\Sigma$-closure of $X$), also denoted by
$\mathit{cl}_\Sigma(X)$.

Lemma 2: Let $\Sigma$ be a set of fds to be satisfied by all legal relations over $R$. Let
$X, Y \subseteq R$. Then the following holds.

$\Sigma \models X \to Y \iff Y \subseteq \mathit{cl}_\Sigma(X)$. ■
Thus in order to test if a given fd $X \to Y$ is implied by $\Sigma$ it is sufficient to
build $\mathit{cl}_\Sigma(X)$ and see if $Y \subseteq \mathit{cl}_\Sigma(X)$. We call an
fd $X \to Y$ closed if $Y = \mathit{cl}_\Sigma(X)$. A set $\Delta$ of fds is closed if
every fd in it is closed. The case $|Y| = 1$ is noteworthy. An fd $X \to Y$ is said to be
in canonical form if $|Y| = 1$. Any fd $X \to Y$ can be converted to a set of fds in
canonical form in view of the following lemma.

Lemma 3: For $1 \le j \le k$ and $Y = A_1 \ldots A_k$, $X \to Y \models X \to A_j$ and
$\{X \to A_1, \ldots, X \to A_k\} \models X \to Y$.

3.2 Computing the closure 

The following algorithm CLOSURE($\Sigma$, $X$) takes as input a set of fds $\Sigma$ and an
attribute set $X$ and outputs $\mathit{cl}_\Sigma(X)$.

Algorithm CLOSURE($\Sigma$, $X$)
begin
    $Y \leftarrow X$
    while (there exists an fd $S \to T$ in $\Sigma$ such that $S \subseteq Y$ and $T \not\subseteq Y$) do
        $Y \leftarrow YT$
    return $Y$
end

Lemma 4: Algorithm CLOSURE($\Sigma$, $X$) correctly outputs $\mathit{cl}_\Sigma(X)$.

Proof: The algorithm terminates after examining a finite number of fds. By induction on
the steps of the algorithm we first claim that from initialization until termination
$Y \subseteq \mathit{cl}_\Sigma(X)$ holds. As $Y$ is initialized to $X$ the claim is true
initially since $\Sigma \models X \to A$ for all $A \in X$. Let the claim be true at some
intermediate stage of the execution, after which the algorithm considers an fd $S \to T$
from $\Sigma$ such that $S \subseteq Y$. Then $Y \to S$ and since $S \to T$ we have
$Y \to T$. By the induction hypothesis $X \to Y$ and so $X \to T$ and we have
$X \to YT$. That is, $\Sigma \models X \to YT$. Therefore the claim is true upon
termination of the algorithm.

We further show that upon termination it is impossible to have
$Y \subset \mathit{cl}_\Sigma(X)$. If possible let the contrary hold. That is, let
$B \in \mathit{cl}_\Sigma(X)$ but $B \notin Y$. Since $B \in \mathit{cl}_\Sigma(X)$ we
have $\Sigma \models X \to B$. We now build a relation $I$ that is legal. Let $I$ consist
of two tuples $u, v$ such that $u[A] = v[A]$ iff $A \in Y$. To show that $I$ satisfies
$\Sigma$ we assume the contrary if possible. Let $S \to T$ be an fd causing a violation.
Then when this fd was considered by the algorithm we should have had $S \subseteq Y$ but
$T \not\subseteq Y$. However in such a case the algorithm would still be at some
intermediate state of execution rather than terminated. Therefore $I$ satisfies $\Sigma$
but by construction $I$ does not satisfy $X \to B$, since $X \subseteq Y$ and
$B \notin Y$. In symbols $\Sigma \not\models X \to B$, which is a contradiction. Hence it
follows that $Y = \mathit{cl}_\Sigma(X)$. ■
Remark. It follows that CLOSURE($\Sigma$, $X$) can be implemented efficiently. A theorem
due to C. Beeri and P.A. Bernstein asserts that the implication problem for fds is
solvable in time linear in the length of the input.
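
The closure computation and the implication test of Lemma 2 can be sketched in Python as follows. This is a straightforward fixpoint loop written for illustration, not the linear-time algorithm of Beeri and Bernstein; the representation of fds as pairs of sets and the names `closure` and `implies` are assumptions.

```python
def closure(fds, attrs):
    """Compute the closure of attrs under fds by repeated application."""
    y = set(attrs)
    changed = True
    while changed:
        changed = False
        for s, t in fds:
            # Apply S -> T whenever S is covered and T adds something new.
            if s <= y and not t <= y:
                y |= t
                changed = True
    return y

def implies(fds, lhs, rhs):
    """Lemma 2: fds |= lhs -> rhs iff rhs is contained in closure(lhs)."""
    return set(rhs) <= closure(fds, lhs)

sigma = [({"B"}, {"C"}), ({"A"}, {"B"})]
cl_A = closure(sigma, {"A"})
```

With the fds B → C and A → B of the earlier example, the closure of A is ABC, which confirms that A → C is implied while C → A is not.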

3.3 Covers 

Given two sets of fds $\Delta$ and $\Sigma$, we say one is a cover for the other if
$\Delta \equiv \Sigma$. We can speak of minimum covers in the following sense: a cover
$\Delta$ of $\Sigma$ is minimum if, for any other cover $\Gamma$ of $\Sigma$, the number
of fds in $\Delta$ is not greater than that in $\Gamma$. With an efficient algorithm for
implication we can efficiently determine equivalence, redundancy and a non-redundant
cover. Correctness of the following algorithm NONREDUND($\Sigma$) is evident.

Algorithm NONREDUND($\Sigma$)
begin
    $\Delta \leftarrow \Sigma$
    for (each fd $\sigma$ in $\Delta$) do
        if $\Delta - \{\sigma\} \models \sigma$
            then $\Delta \leftarrow \Delta - \{\sigma\}$
    return $\Delta$
end
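
A possible Python rendering of NONREDUND is sketched below, under the same assumptions as before: fds are pairs of attribute sets and implication is tested through a closure computation. The function names are hypothetical.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def nonredund(fds):
    """Remove, one at a time, every fd implied by the remaining ones."""
    delta = list(fds)
    for fd in list(delta):
        rest = [g for g in delta if g is not fd]
        if fd[1] <= closure(rest, fd[0]):
            delta = rest
    return delta

a_b = (frozenset("A"), frozenset("B"))
b_c = (frozenset("B"), frozenset("C"))
a_c = (frozenset("A"), frozenset("C"))
cover = nonredund([a_b, b_c, a_c])
```

Here A → C is dropped because it follows from A → B and B → C. The result depends on the order in which fds are examined, so different non-redundant covers may be obtained from the same input.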

The above algorithm does not necessarily find a minimum cover. Let $\Sigma$ be a set of
fds. The following theorem from [6] (which has a stronger version) is stated without
proof.

Theorem 2: Let $\Delta$ be a non-redundant cover for $\Sigma$. If $\Delta$ is closed then
it is minimum. ■

The following simple but non-trivial algorithm MINCOVER($\Sigma$) takes as input a set
$\Sigma$ of fds and outputs a minimum cover for $\Sigma$.

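Algorithm MINCOVER($\Sigma$)
begin
    $\Delta \leftarrow \Sigma$;
    for each $\sigma = X \to Y \in \Sigma$ do
    begin
        $\Delta \leftarrow \Delta - \{\sigma\}$
        if $Y \not\subseteq$ CLOSURE($\Delta$, $X$) then
        begin
            $Z \leftarrow$ CLOSURE($\Delta$, $XY$)
            $\Delta \leftarrow \Delta \cup \{X \to Z\}$
        end
    end
    return $\Delta$
end

Relational databases - Functional dependency - Mar. 2011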

Algorithm MINCOVER($\Sigma$) scans through all the fds in $\Sigma$. On encountering each
$\sigma$ the algorithm removes $\sigma$ from $\Delta$ and thus updates $\Delta$. It checks
if the removal is safe, in which case the updated $\Delta$ is equivalent to $\Sigma$.
Otherwise it updates $\Delta$ again so that after the update $\Delta \equiv \Sigma$. The
latter update ensures that the newly added fd is closed. Theorem 2 assures the correctness
of MINCOVER($\Sigma$). The time complexity of the algorithm can be estimated as
$O(|\Sigma| \times \tau)$ where $\tau$ is the time to find a closure.
Given a set $\Sigma$ of fds, by CANONICAL($\Sigma$) we denote a canonical cover of
$\Sigma$, i.e., a cover for $\Sigma$ such that any fd $\sigma \in$ CANONICAL($\Sigma$) is
in canonical form.
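
The MINCOVER procedure can be sketched in Python as below. This is an illustrative reading rather than a definitive implementation: fds are pairs of frozensets, and Z is computed as the closure of X ∪ Y under the reduced set, an assumption made here so that the re-inserted fd X → Z is closed, as the explanation above requires. The names `closure` and `mincover` are hypothetical.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def mincover(fds):
    """Replace each non-removable fd X -> Y by the closed fd X -> Z."""
    sigma = [(frozenset(l), frozenset(r)) for l, r in fds]
    delta = list(sigma)
    for x, y in sigma:
        delta.remove((x, y))
        if not y <= closure(delta, x):
            # Removal was not safe: put back a closed fd with the same lhs.
            z = frozenset(closure(delta, x | y))
            delta.append((x, z))
    return delta

cover = mincover([({"A"}, {"B"}), ({"A", "B"}, {"C"})])
```

For the input {A → B, AB → C} the sketch returns the single closed fd A → ABC, a cover with one fd.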

4 Database schema design 

The principal goal in database schema design is to design a set of relation schemes that
constitute a database and to indicate their meaningfulness by specifying appropriate
constraints such as fds. Intuitively there appears to be a trade-off between updating a
database and querying it - smaller relation schemes are easier to update while queries on
them may be harder to process. Past research along these lines has focused on arriving at
acceptable ways of grouping attributes into tables and on obtaining normal forms [3]. The
criteria for acceptance of a design are preservation of information and of the suggested
dependencies, and elimination of redundancy.

In formal terms, a relation scheme is a 2-tuple $(R, \Sigma)$ where $R \subseteq U$ and
$\Sigma$ are respectively a set of attributes and an associated set of fds over $R$. Then
a database schema $\mathcal{D}$ is a set of relation schemes, i.e.,
$\mathcal{D} = \{(R_1, \Sigma_1), \ldots, (R_k, \Sigma_k)\}$ where
$U = \bigcup_{j=1}^{k} R_j$.

It is also convenient to define $\Sigma = \bigcup_{j=1}^{k} \Sigma_j$. Finally a database
$B$ over $\mathcal{D}$ is an assignment of a meaningful relation to each relation scheme
in each 2-tuple in $\mathcal{D}$. The primary objective in database schema design is that
problems such as Codd's anomalies should not arise during database operations on tuples,
such as additions, deletions and updates - this concern has resulted in what are called
normal forms of database schemas.

4.1 BCNF 

For a relation scheme $R$, an attribute set $X \subseteq R$ is a determinant of $R$ if
there exists at least one attribute $A \in R - X$ such that $\Sigma \models X \to A$. If
for all $A \in R$ we have $\Sigma \models X \to A$ then $X$ is called a superkey of $R$.
If $X$ is a superkey and further, for any $B \in X$, $\Sigma \not\models X - B \to A$ for
some $A \in R$, then $X$ is called a key of $R$. Using an algorithm for implication, the
following algorithm constructs a key for a relation scheme $(R, \Sigma)$.

Algorithm KEY($R$, $\Sigma$)
begin
    $X \leftarrow R$
    for (each $A \in R$) do
        if $\Sigma \models X - A \to R$
            then $X \leftarrow X - A$
    return $X$
end

The $X$ returned by the algorithm depends on the order in which the different $A$'s are
tested for redundancy.
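
The order dependence is easy to observe in a small Python sketch of KEY. As in the other sketches, fds are pairs of attribute sets and implication is tested via closure; the function names and the optional `order` parameter are assumptions introduced for illustration.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def key(attrs, fds, order=None):
    """Shrink the full attribute set to a key, testing attributes in order."""
    r = set(attrs)
    x = set(attrs)
    for a in (order or sorted(attrs)):
        # Drop a if the remaining attributes still determine all of R.
        if r <= closure(fds, x - {a}):
            x -= {a}
    return x

sigma = [({"A"}, {"B"}), ({"B"}, {"A"}), ({"A"}, {"C"})]
key_default = key("ABC", sigma)
key_other = key("ABC", sigma, order="BAC")
```

With the fds A → B, B → A and A → C, testing the attributes in the order A, B, C yields the key B, while the order B, A, C yields the key A.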

Remark: The problem of finding all the keys for a given relation scheme is NP-complete.
We now state the Boyce-Codd normal form (BCNF), probably the strongest of all normal
forms sought after in database schema design. Intuitively the permitted fds in any legal
relation in BCNF are all due to keys.

Definition of BCNF: $\mathcal{D}$ satisfies BCNF if whenever $X$ is a determinant of $R$
then $X$ is a superkey of $R$, where $R$ is a part of any relation scheme of
$\mathcal{D}$.
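
For a single small relation scheme the definition can be checked by brute force over all attribute subsets, as in the sketch below. It takes exponential time and is meant only to illustrate the definition. The fd STUDENT → DEPARTMENT is an assumption added for the example (the notes state only DEPARTMENT → SUPERVISOR), and the function names are hypothetical.

```python
from itertools import combinations

def closure(fds, attrs):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """List every determinant of the scheme that is not a superkey."""
    r = set(attrs)
    bad = []
    for size in range(1, len(r)):
        for x in combinations(sorted(r), size):
            c = closure(fds, x) & r
            # Determinant: determines something outside x. Superkey: all of r.
            if c - set(x) and not r <= c:
                bad.append(set(x))
    return bad

R = ["STUDENT", "DEPARTMENT", "SUPERVISOR"]
sigma = [({"STUDENT"}, {"DEPARTMENT"}), ({"DEPARTMENT"}, {"SUPERVISOR"})]
violations = bcnf_violations(R, sigma)
```

Under these assumptions DEPARTMENT is a determinant but not a superkey, so the student table of Section 3 is not in BCNF.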

4.2 Normalization via decomposition 

Let $R = ABCDE$ be a relation scheme such that for any meaningful relation on $R$ the
fd $E \to CD$ holds. Consider the decomposition of $R$ as $ABE$ and $CDE$. The following
relation $I$ on $R$ is built by ensuring that $E \to CD$ holds and by randomly filling
values for $A$ and $B$ from their underlying domains.

  I:

  A B C D E

  0 0 0 0 1
  0 1 0 0 1
  1 0 1 1 0
  1 1 1 1 0

We can check that $\pi_{ABE}(I) \bowtie \pi_{CDE}(I) = I$ is always true. This isn't a
coincidence - the presence of fds can guarantee non-lossy decompositions as asserted by
the following theorem.

Theorem 3: Let $I$ be defined on $R = XYZ$ such that $I$ satisfies $X \to Y$. Then the
decomposition of $I$ into $\pi_{XY}(I)$ and $\pi_{XZ}(I)$ is lossless.

Proof. Let $J = \pi_{XY}(I) \bowtie \pi_{XZ}(I)$. In general $I \subseteq J$. It is
therefore sufficient to prove $J \subseteq I$. This is done by showing that
$u \in J \Rightarrow u \in I$ for any $u \in J$. By the definition of $J$,
$u[XY] \in \pi_{XY}(I)$ and $u[XZ] \in \pi_{XZ}(I)$. As $\pi_{XY}(I)$ and $\pi_{XZ}(I)$
are obtained from $I$, there are two tuples $v, w \in I$ such that $u[XY] = v[XY]$ and
$u[XZ] = w[XZ]$. Consequently $v[X] = w[X]$ and since $X \to Y$ it follows that
$w[XY] = v[XY] = u[XY]$.

We reason $w[XYZ] = u[XYZ]$, i.e., $w = u$, and hence $u \in I$. ■

To implement normalization via decomposition assume that $\mathcal{D}$ is not in BCNF,
where $\mathcal{D}$ is a database schema defined as above. W.l.o.g. assume that the
relation scheme $(R_j, \Sigma_j)$ is the cause for violation. For an attribute set $X$ we
define $\pi_X(\Sigma) = \{S \to T \mid S \to T \in \Sigma \text{ and } ST \subseteq X\}$.
Since $(R_j, \Sigma_j)$ is not in BCNF there exists an $X$ that is a determinant of $R_j$
but that isn't a superkey for $R_j$. Therefore there is an attribute $A \in R_j - X$ such
that $\Sigma \models X \to A$ where $\Sigma = \bigcup_{j=1}^{k} \Sigma_j$. In the
decomposition process we invoke theorem 3 and replace $(R_j, \Sigma_j)$ by
$D_1 = (\pi_{XA}(R_j), \Sigma_j^1)$ and $D_2 = (\pi_{R_j - A}(R_j), \Sigma_j^2)$ where
$\Sigma_j^1 = \pi_{XA}(\Sigma_j)$ and $\Sigma_j^2 = \pi_{R_j - A}(\Sigma_j)$, and continue
the process of decomposition further if $D_2$ is not in BCNF.

Let $\mathcal{D}$ be as defined above and let $\mathcal{E} = (U, \Sigma)$, also referred
to as a universal schema. As a part of the design process we would like to find when it
makes reasonable sense to say that $\mathcal{D}$ represents $\mathcal{E}$. It perhaps
follows intuitively that we need these two conditions viz., (a) there should be no loss of
information if relations are stored using schema $\mathcal{D}$ rather than schema
$\mathcal{E}$ and (b) $\Sigma$ should logically imply the fds in all the $\Sigma_j$'s and
together the $\Sigma_j$'s should logically imply $\Sigma$. This amounts to the requirement
that a decomposition from $\mathcal{E}$ to $\mathcal{D}$ should be lossless and dependency
preserving. Formally $\mathcal{D}$ represents $\mathcal{E}$ if the following conditions
hold together.

(i) Let $I$ be a relation on $U$ satisfying $\Sigma$. Then the decomposition of $I$ into
$\pi_{R_j}(I)$, $j = 1, \ldots, k$, is lossless.

(ii) Let $\Delta = \bigcup_{j=1}^{k} \Sigma_j$. Then $\Sigma \models \Delta$ and
$\Delta \models \Sigma$.

The decomposition in theorem 3 is such that (i) is guaranteed but only half of (ii) is
satisfied. Unfortunately in general it appears that it is not possible to efficiently find
a decomposition resulting in BCNF that satisfies both (i) and (ii) above. This is asserted
by theorem 4 that follows.

4.2.1 Checking for BCNF violations 

We begin with the hitting set problem which is NP-complete (problem [SP8] on p. 222
of [5]).

Hitting set: Let $T = \{A_1, \ldots, A_n\}$ and let $B_j \subseteq T$ for
$j = 1, \ldots, m$. The problem asks to find, if possible, a hitting set $W \subseteq T$
such that for each $j$, $|W \cap B_j| = a$ where $a \ge 1$.

In the following example a = 1. For a generalized hitting set problem see The 
generalized version models known problems such as vertex cover. 

Ex. Setting $T = \{p_1, \ldots, p_8\}$, let $B_1 = \{p_1, p_2, p_3\}$,
$B_2 = \{p_2, p_3, p_4\}$, $B_3 = \{p_1, p_7, p_8\}$, $B_4 = \{p_5, p_6, p_7\}$. We
interpret $T$ to be a set of persons and we set that a task $t_j$ requires for its
completion skills available with any person in $B_j$. It is required to find a group of
persons from $T$ who can complete all the tasks subject to the constraint that only one
person is selected from each $B_j$. We may require finding (i) a $W$ so as to maximize or
minimize $|W|$ or (ii) $W_1$ and $W_2$, where possible, so that
$W_1 \cap W_2 = \emptyset$.
The problem of determining if a given database schema violates BCNF is a hard problem.
The following proof relies on the hardness of Hitting set with $a = 1$.

Theorem 4: [C. Beeri and P.A. Bernstein] Let $\mathcal{D}$ be a given database schema. It
is NP-complete to check if there is a BCNF violation in $\mathcal{D}$.

Proof: That the problem belongs to the class NP is clear from [7]. Following [7] we
show that the problem is NP-hard. That is, we reduce the hitting set problem to the
problem at hand. More specifically, the proof shows that each instance of the hitting set
problem can be mapped in polynomial time to a database schema such that there exists
a hitting set iff the produced schema violates BCNF.

Retaining the notations of the hitting set problem let
$U = \{A_1, \ldots, A_n, B_1, \ldots, B_m, C, D\}$.
We build a database schema $\mathcal{D}$ consisting of the following relation schemes.
D1. For every pair $A_i, B_j$ such that $A_i \in B_j$ we include in $\mathcal{D}$
$(A_i B_j, \{A_i \to B_j\})$.
D2. We include in $\mathcal{D}$ $(B_1 \ldots B_m C D, \{B_1 \ldots B_m \to C\})$.

D3. Finally we include in $\mathcal{D}$
$(A_1 \ldots A_n C D, \{CD \to A_1 \ldots A_n\} \cup \{A_i A_j \to CD \mid i \ne j$ and
both $A_i, A_j$ belong to $B_k$ for some $k\})$.

It is not difficult to reason that with D1, D2 and D3, $\mathcal{D}$ can be constructed in
polynomial time. Let $W = \{A_{a_1}, \ldots, A_{a_r}\} \subseteq T$ be a hitting set. Then
for every $B_j$, $W \cap B_j = \{A_{a_k}\}$ for some $1 \le k \le r$. By D1 this
necessarily means $A_{a_k} \to B_j$. We can conclude that, for every $i$,
$\Sigma \models W \to B_i$. Using lemma 4 with D2 we then have $\Sigma \models W \to C$.
So in D3 $W$ is a determinant of $A_1 \ldots A_n C D$. We now make the following claim 4.1
which implies that $W$ is not a superkey for the relation scheme in D3.
Claim 4.1: $\mathit{cl}_\Sigma(W) = W B_1 \ldots B_m C$ where $\Sigma$ is the set of all
fds in $\mathcal{D}$.
To establish the claim we note that from CLOSURE($\Sigma$, $X$) it follows that it is
sufficient to show that for every fd $S \to T \in \Sigma$ either
$S \not\subseteq \mathit{cl}_\Sigma(W)$ or $T \subseteq \mathit{cl}_\Sigma(W)$ holds. This
follows by considering D1, D2 and D3 separately.

Conversely let $\mathcal{D}$ be not in BCNF. Then one or more of D1, D2, D3 should contain
a violation of BCNF - we find that only D3 can cause a violation. From D3 we first
observe that for any $i, j$ ($i \ne j$) if $A_i \in B_k$ and $A_j \in B_k$ then
$A_i A_j \to A_1 \ldots A_n$. Let $W \subseteq A_1 \ldots A_n C D$ be a determinant but
not a superkey. Clearly $C, D \notin W$. So let $W \subseteq A_1 \ldots A_n$. Because of
the observation above $W$ cannot contain two distinct $A_i, A_j$ belonging to some $B_k$.
That is, $W$ consists of $A_i$'s such that every such $A_i \in B_j$ and there exists no
other $A_j \in B_j$ in $W$. Let $W = \{A_{a_1}, \ldots, A_{a_r}\}$ so that each $A_{a_k}$
embraces some $B_p$. If not all the $B_j$'s are embraced by our choice of $W$, let there
be a $B_i$ such that there exists $A_q \in B_i$, as $B_i \subseteq T$, but
$A_q \notin W$. We then update the current $W$ by including $A_q$ and do not further
reckon any $B_l$ if $A_q \in B_l$. This way we can expand $W$, embracing all possibly left
out $B_j$'s, so that it becomes a hitting set. ■

4.3 3NF and normalization via synthesis

We assume $\mathcal{D}$ is a database schema as defined above. Starting from
$\mathcal{D}$, instead of obtaining a desirable database schema in BCNF representing
$\mathcal{D}$, which may not be feasible in some cases, it is possible to settle for a
weaker normal form referred to as the third normal form (3NF). The definition of 3NF can
be given in terms of a strong determinant. An attribute $A \in R_j$ is called prime if
there is a key $Z$ of $R_j$ such that $A \in Z$. $Z$ is a strong determinant of $R_j$ if
$Z \subseteq R_j$ and there exists a nonprime $A \in R_j - Z$ such that
$\Sigma \models Z \to A$. Here it is not necessary for $Z$ to be a superkey.

Modern definition of 3NF: $\mathcal{D}$ is in 3NF if whenever $Z$ is a strong determinant
for any $R_j$ in $\mathcal{D}$ then $Z$ is a superkey of $R_j$.

The problem of determining whether a given relation scheme $(R_j, \Sigma_j)$ is in 3NF can
be settled in one of the following two ways.

1. Show that for all fds $X \to A \in \Delta_j$, $X$ is a superkey or $A$ is prime, where
$\Delta_j$ = CANONICAL($\Sigma_j$). Conclude that 3NF is not violated.

2. Show that there exists an fd $X \to A \in \Delta_j$ such that $X$ is not a superkey and
$A$ isn't prime, where $\Delta_j$ is as in 1. Conclude that 3NF is violated.

Remarks:

1. First we note that BCNF implies 3NF. Let $(R_j, \Sigma_j)$ be in $\mathcal{D}$ and let
$X \to A \in \Sigma_j$. If $A \in R_j$ is prime then there is at least one key
$Y \subseteq R_j$ such that $A \in Y$. Assume that we decompose $R_j$ as
$\pi_{XA}(R_j)$ and $\pi_{R_j - A}(R_j)$. Then the fd $Y \to R_j$ is lost since $Y$ is a
key and $A \in Y$. Therefore the resulting schema does not represent $\mathcal{D}$. Such a
problem will not arise if $A$ is nonprime.

2. The problem of determining whether a given attribute is prime is NP-complete
(problem [SR28] on p. 232 of [5]).

3. [8] presents an algorithm to check whether or not a relation scheme is in 3NF.
Normalization to 3NF via decomposition is not practical but fact 2 gives the good news.

Fact 2: For any universal schema $\mathcal{E}$ there exists a database schema
$\mathcal{D}$ in 3NF representing $\mathcal{E}$. Moreover normalization through synthesis
finds $\mathcal{D}$ efficiently as shown by P.A. Bernstein around 1976.

The 3NF synthesis algorithm can be described along the following lines.

Algorithm 3NF($U$, $\Sigma$)
begin
    $\mathcal{D} \leftarrow \emptyset$
    $\Delta \leftarrow$ REDUCED(NONREDUND(CANONICAL($\Sigma$)))
    $X \leftarrow$ KEY($U$, $\Delta$)
    for (each fd $Y \to A \in \Delta$) do
        $\mathcal{D} \leftarrow \mathcal{D} \cup \{(YA, \pi_{YA}(\Delta))\}$
    $\mathcal{D} \leftarrow \mathcal{D} \cup \{(X, \emptyset)\}$
    return $\mathcal{D}$
end

Algorithm 3NF($U$, $\Sigma$) first finds a reduced nonredundant canonical cover for
$\Sigma$. Then for every fd in it a relation scheme is created. Finally a key for $U$ is
added. It can be seen that the algorithm is efficient. The following theorem is stated
without proof.
Theorem 5: Algorithm 3NF($U$, $\Sigma$) is correct, i.e., the $\mathcal{D}$ output by the
algorithm represents the input schema $(U, \Sigma)$.
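
The synthesis steps can be put together in a compact Python sketch. It follows the outline above under stated assumptions: fds are pairs of frozensets, implication is tested via closure, attributes are single characters, and all function names are hypothetical. It is meant as an illustration of the pipeline, not as Bernstein's original algorithm.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical(fds):
    """Split every fd so that each right-hand side is a single attribute."""
    return [(frozenset(l), frozenset({a})) for l, r in fds for a in sorted(r)]

def nonredund(fds):
    delta = list(fds)
    for fd in list(delta):
        rest = [g for g in delta if g is not fd]
        if fd[1] <= closure(rest, fd[0]):
            delta = rest
    return delta

def reduced(fds):
    out = list(fds)
    for i, (lhs, rhs) in enumerate(out):
        for a in sorted(lhs):
            if rhs <= closure(out, lhs - {a}):
                lhs = lhs - {a}
                out[i] = (lhs, rhs)
    return out

def key(attrs, fds):
    r, x = set(attrs), set(attrs)
    for a in sorted(attrs):
        if r <= closure(fds, x - {a}):
            x -= {a}
    return x

def synthesize_3nf(attrs, fds):
    """One scheme per fd of the cover, plus one scheme holding a key of U."""
    delta = reduced(nonredund(canonical(fds)))
    schema = []
    for lhs, rhs in delta:
        scheme = lhs | rhs
        local = [(l, r) for l, r in delta if (l | r) <= scheme]
        schema.append((scheme, local))
    schema.append((frozenset(key(attrs, delta)), []))
    return schema

sigma = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
schema = synthesize_3nf("ABC", sigma)
```

For U = ABC and the fds A → B, B → C, A → C the sketch produces the schemes AB and BC together with the key scheme A. As in the algorithm above, the key scheme is added even when it is contained in another scheme.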

5 Concluding remarks 

Database theoreticians have pointed out that there is not much literature on the
properties of sets of fds. We refer to [2] for some modern approaches concerning the
presentation of sets of fds, based on which full knowledge of the validity of fds w.r.t. a
stated context may be extracted. It is known that redundancy and potential inconsistency
can also be present in certain relations in the absence of fds. A multivalued dependency
(mvd) occurs when an attribute set conditionally defines a set of other attributes.
Formally, if $XY \subseteq R$, $X$ multidetermines $Y$, written
$X \twoheadrightarrow Y$, if for every relation $I$ on $R$ and for all tuples
$u, v \in I$, if $u[X] = v[X]$ then there exists a tuple $w \in I$ such that
$w[X] = u[X] = v[X]$, $w[Y] = u[Y]$ and $w[R - XY] = v[R - XY]$. The notion of mvds is
closely related to join. The fourth normal form (4NF), a generalization of BCNF, requires
that every mvd is due to keys. Let $R$ be a relation scheme, let $X, Y \subseteq R$
($X \ne \emptyset$, $Y \ne \emptyset$) and let $\Sigma$ be a set of fds and mvds that need
to be satisfied by legal relations on $R$. $R$ is in 4NF if for every mvd
$X \twoheadrightarrow Y$ that is to hold on legal relations over $R$, either the mvd is
trivial, i.e., $Y \subseteq X$ or $XY = R$, or $X$ is a superkey in the sense defined
before. A theorem states that if $R$ obeys only those fds and mvds that are logical
consequences of a set of fds then 4NF coincides with BCNF. The interaction between fds and
mvds has been studied under a sound and complete formal system.
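
The definition of a multivalued dependency can be checked on a relation instance with a short Python sketch. The function name `satisfies_mvd`, the tuple-based representation and the two example relations are assumptions made for illustration.

```python
def satisfies_mvd(rel, schema, x, y):
    """Check X ->> Y on rel, a set of tuples over the attribute list schema."""
    pos = {a: i for i, a in enumerate(schema)}

    def pick(t, attrs):
        return tuple(t[pos[a]] for a in attrs)

    rest = [a for a in schema if a not in x and a not in y]
    for u in rel:
        for v in rel:
            if pick(u, x) != pick(v, x):
                continue
            # Need w agreeing with u on X and Y, and with v on the rest.
            if not any(pick(w, x) == pick(u, x)
                       and pick(w, y) == pick(u, y)
                       and pick(w, rest) == pick(v, rest) for w in rel):
                return False
    return True

ABC = ["A", "B", "C"]
r1 = {(0, 0, 0), (0, 1, 1)}
r2 = r1 | {(0, 0, 1), (0, 1, 0)}
mvd_r1 = satisfies_mvd(r1, ABC, ["A"], ["B"])
mvd_r2 = satisfies_mvd(r2, ABC, ["A"], ["B"])
```

The relation `r1` violates A ↠ B because the mixed tuple (0, 0, 1) is missing, while `r2` satisfies it even though the fd A → B fails in `r2`. This matches the remark that redundancy can be present in the absence of fds.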